Section: New Results

Graph-based Knowledge Representation

SPARQL Based Pretty Printing Language

Participant : Olivier Corby.

We have designed SPARQL Template, a pretty-printing rule language for RDF graphs. It makes it possible to pretty print RDF graphs representing Abstract Syntax Trees (ASTs) of languages such as SPIN or the OWL RDF syntax. We have implemented a pretty-printing engine that interprets SPARQL Template.

An example template for an OWL "someValuesFrom" statement is shown below. The SPARQL 1.1 "where" part specifies the conditions under which the rule applies to a focus node "?in". The template part specifies the result of pretty printing the focus node. Variables in the template part are recursively replaced by the result of their own pretty printing.

template {
  "someValuesFrom(" ?p " " ?c ")"
}
where {
  ?in a owl:Restriction ;
    owl:onProperty ?p ;
    owl:someValuesFrom ?c
}
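The recursive substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Corese engine: the toy graph, the node names (`_:r1`, `ex:hasChild`, and so on) and the helper functions `value` and `pretty` are all assumptions made for the example, and a single hard-coded rule stands in for a set of templates.

```python
# Toy RDF graph as a set of (subject, predicate, object) triples.
# All node names below are hypothetical, chosen only for illustration.
TRIPLES = {
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "ex:hasChild"),
    ("_:r1", "owl:someValuesFrom", "_:r2"),
    ("_:r2", "rdf:type", "owl:Restriction"),
    ("_:r2", "owl:onProperty", "ex:hasPet"),
    ("_:r2", "owl:someValuesFrom", "ex:Dog"),
}


def value(node, prop):
    """Return the object of the triple (node, prop, ?o), or None if absent."""
    for s, p, o in TRIPLES:
        if s == node and p == prop:
            return o
    return None


def pretty(node):
    """Pretty print a focus node, recursing into the template variables."""
    p = value(node, "owl:onProperty")
    c = value(node, "owl:someValuesFrom")
    # "where" part: the rule applies only to a restriction with both values.
    if (node, "rdf:type", "owl:Restriction") in TRIPLES and p and c:
        # "template" part: each variable is replaced by its own pretty print.
        return "someValuesFrom(" + pretty(p) + " " + pretty(c) + ")"
    # No rule applies: fall back to the node itself.
    return node


print(pretty("_:r1"))
# prints: someValuesFrom(ex:hasChild someValuesFrom(ex:hasPet ex:Dog))
```

The nested restriction `_:r2` is printed by the same rule that prints `_:r1`, which is the recursive behaviour the template language relies on.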

We have introduced named templates that are called explicitly using a "kg:template" extension function.

The pretty printing language and engine have been validated on five RDF ASTs (ftp://ftp-sop.inria.fr/wimmics/soft/pprint): SPIN, OWL 2, SQL, Turtle and a mockup of mathematical expressions pretty printed into LaTeX. The SPIN pretty printer is used in the PhD Thesis of Oumy Seye on "Rules for the Web of Data" and the SQL pretty printer is used in the PhD Thesis of Corentin Follenfant on "Usage semantics of analytics and Business Intelligence tools".

Federated Semantic Data Query

Participants : Olivier Corby, Alban Gaignard.

Another activity of the team addresses the data explosion challenges faced in e-Science. Semantic Web technologies are well suited to representing the knowledge associated with both e-Science data and processing tools. A PhD thesis [76], addressing distributed knowledge production and sharing in collaborative e-Science platforms, was successfully defended this year. Moreover, we participated in the organization of the second edition of the CrEDIBLE workshop (http://credible.i3s.unice.fr), which gathered international experts to discuss the challenges of federating distributed biomedical imaging data and knowledge.

In this area, the main scientific results are (i) a software architecture for transparently querying multiple data sources through the SPARQL language [73], and (ii) a set of querying strategies and optimizations designed to limit the cost of distributed query processing while retaining sufficient expressivity (full SPARQL 1.1 support, including named graphs, property path expressions, optional, aggregates, etc.).
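To make the notion of transparent querying over several sources concrete, here is a minimal Python sketch. It is a naive illustration under stated assumptions and not the strategy implemented in Corese: the two in-memory sources, the `ex:` vocabulary and the functions `match` and `federated_query` are hypothetical, and every triple pattern is simply sent to every source with the bindings found so far.

```python
# Two hypothetical data sources, each holding its own triples.
SOURCES = {
    "siteA": [
        ("ex:img1", "ex:patient", "ex:p1"),
        ("ex:img2", "ex:patient", "ex:p2"),
    ],
    "siteB": [
        ("ex:p1", "ex:diagnosis", "ex:glioma"),
    ],
}


def match(triples, pattern, binding):
    """Yield extensions of `binding` under which `pattern` matches a triple.

    Terms starting with '?' are variables; all other terms are constants.
    """
    for triple in triples:
        new = dict(binding)
        for term, val in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier value.
                if new.setdefault(term, val) != val:
                    break
            elif term != val:
                break
        else:
            yield new


def federated_query(sources, patterns):
    """Evaluate the patterns left to right, asking every source for each one."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            new
            for binding in bindings
            for triples in sources.values()
            for new in match(triples, pattern, binding)
        ]
    return bindings


RESULTS = federated_query(
    SOURCES,
    [("?img", "ex:patient", "?p"), ("?p", "ex:diagnosis", "?d")],
)
print(RESULTS)
# prints: [{'?img': 'ex:img1', '?p': 'ex:p1', '?d': 'ex:glioma'}]
```

The answer joins a triple held by `siteA` with one held by `siteB`, although the caller never names a source. The cost of such a naive loop grows quickly with the number of sources and intermediate bindings, which is why dedicated strategies and optimizations are needed.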

Performance-oriented experiments have been conducted on the Grid'5000 distributed computing infrastructure to compare our approach with state-of-the-art engines such as FedX [85], Splendid [77], and DARQ [82]. Experiments based on the FedBench benchmark [84] show performance lying between that of DARQ, Splendid, and FedX, while retaining high expressivity.

Since distributed query processing leads to complex and costly processes, we have started to collect provenance information, which opens interesting perspectives towards enhanced trust and reproducibility in Linked Data querying and reasoning.
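As a rough illustration of what collecting provenance can mean in this setting, the sketch below records, for each result, the endpoints that contributed it. The endpoint URLs, the `ex:` vocabulary and the function `query_with_provenance` are assumptions made for the example; the provenance model actually used in our work is not shown here.

```python
from collections import defaultdict

# Hypothetical endpoints and their triples.
SOURCES = {
    "http://site-a.example/sparql": [
        ("ex:img1", "ex:modality", "ex:MRI"),
        ("ex:img2", "ex:modality", "ex:CT"),
    ],
    "http://site-b.example/sparql": [
        ("ex:img1", "ex:modality", "ex:MRI"),
    ],
}


def query_with_provenance(sources, predicate):
    """Return {(subject, object): [endpoints]} for triples using `predicate`."""
    provenance = defaultdict(list)
    for endpoint, triples in sources.items():
        for s, p, o in triples:
            if p == predicate:
                # Remember which endpoint asserted this result.
                provenance[(s, o)].append(endpoint)
    return dict(provenance)


PROVENANCE = query_with_provenance(SOURCES, "ex:modality")
for result, endpoints in PROVENANCE.items():
    print(result, "from", endpoints)
```

A result confirmed by several endpoints, such as the modality of `ex:img1` here, can then be distinguished from one asserted by a single source.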

These distributed query processing strategies have been implemented and integrated into Corese through two main components, namely a data source federator and a data source endpoint. A prototype Web application has also been developed to demonstrate our approach. End-users can configure and launch distributed SPARQL queries, then visualize the SPARQL results and their associated provenance.

Rules for the Web of Data

Participants : Olivier Corby, Catherine Faron Zucker, Oumy Seye.

This work takes place in the PhD Thesis of Oumy Seye.

This year's objective is to foster knowledge reuse on the Web based on the principles of Linked Data. Our approach is to consider rule bases as data sources that can be published, shared and queried as Linked Data, thus enabling the selection and reuse of relevant and useful shared rules in any particular context or application. We propose to select rules by querying the metadata annotating rules, the content of rules, or both. To make rule content queryable, we use RDF representations of SPARQL rules in the SPIN format (http://www.w3.org/Submission/spin-overview/).

This idea is in line with the principles of the Semantic Web, which encourage the sharing and reuse of knowledge. We used the SPIN syntax (which allows the representation of a SPARQL query in RDF) obtained with the SPIN pretty printer of Corese. We have subsequently been able to select rules of interest with Corese. The proposal makes it possible to search for rules based on their content. This helps users extract the relevant set of rules for their data, and thus leverage shared rules more easily. This idea can be used to build a search engine for rules on the Web or a tool for automatically connecting rules with semantic data.

In the remainder of this work, we will focus on updating harvested rules. A poster on this work was presented at the GLC pole day on July 8 and at the ESWC summer school on September 2.
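The selection of rules by content can be sketched as follows. The encoding is a deliberately flattened stand-in for SPIN (real SPIN attaches an RDF list of triple patterns to `sp:where`), and the rule names, the `ex:` vocabulary and the functions `rule_predicates` and `select_rules` are hypothetical. The selection criterion used here, keeping the rules whose body predicates all occur in the data, is only one possible notion of relevance.

```python
# Rules described as triples: metadata plus a simplified view of the body.
RULE_GRAPH = [
    ("ex:rule1", "rdfs:label", "grandparent"),
    ("ex:rule1", "sp:where", "_:t1"),
    ("_:t1", "sp:predicate", "ex:hasParent"),
    ("ex:rule2", "rdfs:label", "colleague"),
    ("ex:rule2", "sp:where", "_:t2"),
    ("_:t2", "sp:predicate", "ex:worksFor"),
]

# The user's data, for which relevant rules are wanted.
DATA = [("ex:ann", "ex:hasParent", "ex:bob")]


def rule_predicates(graph):
    """Map each rule to the set of predicates used in its body."""
    pattern_pred = {s: o for s, p, o in graph if p == "sp:predicate"}
    result = {}
    for s, p, o in graph:
        if p == "sp:where":
            result.setdefault(s, set()).add(pattern_pred[o])
    return result


def select_rules(graph, data):
    """Keep the rules whose body predicates all occur in the data."""
    vocabulary = {p for _, p, _ in data}
    return sorted(
        rule
        for rule, preds in rule_predicates(graph).items()
        if preds <= vocabulary
    )


SELECTED = select_rules(RULE_GRAPH, DATA)
print(SELECTED)
# prints: ['ex:rule1']
```

Because the rule bodies are themselves data, the same lookup could be phrased as a query over the rule graph, which is what representing rules in RDF makes possible.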

Semantic Web and Business Intelligence

Participants : Corentin Follenfant, Olivier Corby, Fabien Gandon.

This PhD Thesis is carried out with a CIFRE industrial grant from SAP Research.

The bilateral contract with SAP aims at bringing together Semantic Web and Business Intelligence through a framework applying the read/write Web principles to the business knowledge carried within Business Intelligence reports. These reports often provide a dynamic view of numerical data from various enterprise sources, mainly relational databases. Reports are authored with a complex process that can be reduced to writing, directly or through different layers of user interfaces, SQL queries that query the sources and feed the dynamic reports. In order to simplify the query authoring process, complementary approaches are envisioned.

Our approach models the queries as knowledge through their abstract syntax trees (ASTs) using Semantic Web tools, and queries and manipulates them through the appropriate standards: RDF/S for modelling and SPARQL for querying and manipulation. Indeed, RDF enables us to model the actual structure of the ASTs by integrating the knowledge related to the syntax and semantics of the SQL queries: types can be captured with XML Schema Datatypes, while more specific business knowledge can also be designed according to the source business models and used to annotate various entities referenced within the SQL queries. Regarding the query and manipulation part, a library of SPARQL queries was designed to perform generic AST manipulation (generic from a DSL perspective); it can be used to search, extract, edit, prune or graft parts of RDF-modelled ASTs.
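The kind of AST manipulation mentioned above can be illustrated with a small Python sketch that prunes a subtree from an AST stored as triples. In our work these operations are expressed as SPARQL queries; the Python functions `subtree` and `prune`, the `sql:` vocabulary and the encoding of the sample query are assumptions made only for this example.

```python
# Hypothetical triple encoding of:
#   SELECT name FROM customer WHERE age > 30
AST = {
    ("_:q", "rdf:type", "sql:Select"),
    ("_:q", "sql:column", "_:c"),
    ("_:c", "sql:name", "name"),
    ("_:q", "sql:from", "_:t"),
    ("_:t", "sql:name", "customer"),
    ("_:q", "sql:where", "_:w"),
    ("_:w", "rdf:type", "sql:GreaterThan"),
    ("_:w", "sql:left", "_:a"),
    ("_:a", "sql:name", "age"),
    ("_:w", "sql:right", "30"),
}


def subtree(ast, root):
    """Collect every triple reachable from `root` by following objects."""
    found, seen, stack = set(), set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for triple in ast:
            if triple[0] == node:
                found.add(triple)
                stack.append(triple[2])
    return found


def prune(ast, parent, prop):
    """Return a copy of the AST without the subtree attached via `prop`."""
    removed = set()
    for s, p, o in ast:
        if s == parent and p == prop:
            removed.add((s, p, o))
            removed |= subtree(ast, o)
    return ast - removed


# Drop the WHERE clause, leaving SELECT name FROM customer.
PRUNED = prune(AST, "_:q", "sql:where")
print(len(AST), "->", len(PRUNED))
# prints: 10 -> 5
```

Grafting is the reverse operation: adding a link from a parent node to the root of another set of triples. Search and extraction correspond to the traversal done by `subtree`.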

While this year was mostly dedicated to manuscript writing, additional experiments were run to demonstrate the validity of our model: a large set of ANSI SQL queries generated with a TPC-DS benchmark was converted to its RDF representation. Conversely, a generic pretty printer system developed in the Corese engine was validated during the internship of Abdoul Macina, who developed a set of rules that let the pretty printer turn RDF-modelled SQL queries back into their concrete syntactic form. This enables iterative query design by leveraging AST patterns rather than manually editing raw syntax.